FIXED-WIDTH CONFIDENCE INTERVALS IN LINEAR REGRESSION WITH
APPLICATIONS TO THE JOHNSON-NEYMAN TECHNIQUE

Abstract

Fixed-width confidence intervals for a population regression line 
over a finite interval of x have recently been derived by Gafarian. 
The method is extended to provide fixed-width confidence intervals for 
the difference between two population regression lines, resulting in a
simple procedure analogous to the Johnson-Neyman technique.



1. SUMMARY AND INTRODUCTION 

In the simple linear regression model, the length of the confidence
interval for a predicted y at a given x depends on x, increasing as
x departs from $\bar{x}$. Simultaneous confidence intervals for the predicted
y for all x are given by the well-known Working-Hotelling [1929] hyperbolic
band (Miller, 1966, p. 111). Gafarian [1964] has recently derived
simultaneous fixed-width confidence intervals for the predicted y, for
all x in a finite range centered at $\bar{x}$, and has provided tables for
the calculation of these intervals.

An important application of the Working-Hotelling confidence band
is in two-group analysis of covariance with a single covariate, when the
two population regression lines are not parallel. The Johnson-Neyman
[1936] technique, as modified by Potthoff [1964], for locating the values
of x for which a significant difference can be asserted between the
population regression lines, is a simple application of the
Working-Hotelling procedure.

The fixed-width confidence interval procedure of Gafarian may also
be applied to the above analysis of covariance model, providing a different
and simple procedure for locating the values of x for which a significant
difference can be asserted. Gafarian's tables, with slight modifications,
may be used in this procedure.
Conclusions are drawn about the relative merits of the two procedures,
and an example is given.

2. THE WORKING-HOTELLING AND GAFARIAN BANDS

For the linear regression model 

\[ y = X\beta + \varepsilon \]

where

\[ X' = \begin{pmatrix} 1 & 1 & \cdots & 1 \\ x_1 & x_2 & \cdots & x_n \end{pmatrix}, \quad \beta' = (\beta_0, \beta_1), \quad y' = (y_1, \ldots, y_n), \]

and $\varepsilon \sim N(0, \sigma^2 I_n)$, the (unbiased) maximum likelihood estimates of $\beta$ and
$\sigma^2$ are

\[ \hat\beta = (X'X)^{-1} X'y = S^{-1} X'y , \]

\[ \hat\sigma^2 = (y'y - y'X\hat\beta)/(n - 2) , \]

and the covariance matrix of $\hat\beta$ is $\sigma^2 S^{-1}$.
A $100(1 - \alpha)\%$ confidence interval for $\beta'\mathbf{x}_0$, where
$\mathbf{x}_0' = (1 \;\; x_0)$, when $x_0$ is specified, is

\[ \beta'\mathbf{x}_0 \in \hat\beta'\mathbf{x}_0 \pm t^{\alpha/2}_{n-2}\, \hat\sigma\, (\mathbf{x}_0' S^{-1} \mathbf{x}_0)^{1/2} . \]

If simultaneous confidence intervals for all x with confidence coefficient
$100(1 - \alpha)\%$ are required, then $t^{\alpha/2}_{n-2}$ is replaced by $(2F^{\alpha}_{2,n-2})^{1/2}$,
producing the Working-Hotelling confidence band for the entire regression
line:

\[ \beta'\mathbf{x} \in \hat\beta'\mathbf{x} \pm (2F^{\alpha}_{2,n-2})^{1/2}\, \hat\sigma\, (\mathbf{x}' S^{-1} \mathbf{x})^{1/2} \]

(see Miller, 1966, p. 111, for further details).
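The two half-widths above can be compared numerically. The following is a minimal sketch, not part of the original paper: the function name and arguments are illustrative, and the critical values $t^{\alpha/2}_{n-2}$ and $F^{\alpha}_{2,n-2}$ are assumed to be supplied by the user (for instance from tables) rather than computed.

```python
import math

def band_half_widths(xs, sigma_hat, x0, t_crit, f_crit):
    """Half-widths at x0 of the pointwise interval and the Working-Hotelling band.

    t_crit is the upper alpha/2 point of t(n-2) and f_crit the upper alpha
    point of F(2, n-2); both are inputs, not computed here.
    """
    n = len(xs)
    x_bar = sum(xs) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    # For the design with columns (1, x), x0' S^{-1} x0 reduces to
    # 1/n + (x0 - x_bar)^2 / sum (x_i - x_bar)^2.
    quad = 1.0 / n + (x0 - x_bar) ** 2 / sxx
    se = sigma_hat * math.sqrt(quad)
    pointwise = t_crit * se
    working_hotelling = math.sqrt(2.0 * f_crit) * se
    return pointwise, working_hotelling
```

Both half-widths are smallest at $x_0 = \bar{x}$ and grow as $x_0$ moves away from it, which is the behaviour that motivates a fixed-width alternative.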

In all practical cases, however, we will be interested in the
regression equation over a finite range of x, say a < x < b. For such
a range, the Working-Hotelling band will have a confidence coefficient
greater than $100(1 - \alpha)\%$, so this band is wastefully wide.

Gafarian [1964] derived fixed-width simultaneous confidence intervals 
over a finite range by the following argument. If the linear model is 
reparametrized as 

\[ y = X^{*}\gamma + \varepsilon \]

where

\[ X^{*\prime} = \begin{pmatrix} 1 & 1 & \cdots & 1 \\ x_1 - \bar{x} & x_2 - \bar{x} & \cdots & x_n - \bar{x} \end{pmatrix} , \]

then $\hat\sigma$ is as before, and

\[ \hat\gamma' = (\bar{y}, \hat\beta_1) , \]

while the covariance matrix of $\hat\gamma$ is

\[ \sigma^2 S^{*-1} = \sigma^2 (X^{*\prime} X^{*})^{-1} = \sigma^2 \begin{pmatrix} 1/n & 0 \\ 0 & 1/\sum (x_i - \bar{x})^2 \end{pmatrix} = \sigma^2 \begin{pmatrix} s_0^2 & 0 \\ 0 & s_1^2 \end{pmatrix} . \]

Then

\[ (\hat\gamma_0 - \gamma_0)/s_0\sigma = \sqrt{n}\,(\hat\gamma_0 - \gamma_0)/\sigma \]

and

\[ (\hat\gamma_1 - \gamma_1)/s_1\sigma \]

are independent standard normal variables, and hence

\[ T_0 = (\hat\gamma_0 - \gamma_0)/s_0\hat\sigma , \quad T_1 = (\hat\gamma_1 - \gamma_1)/s_1\hat\sigma \]

have a bivariate t-distribution with n - 2 degrees of freedom
(see Dunnett & Sobel, 1954; Press, 1972, p. 127). Then, writing
$\mathbf{x}^{*\prime} = (1 \;\; x - \bar{x})$,

\[ \begin{aligned}
P\{ |(\hat\gamma - \gamma)'\mathbf{x}^{*}| < \delta\hat\sigma ,\ \forall x \in (a,b) \}
 &= P\{ |s_0 T_0 + s_1 (x - \bar{x}) T_1| < \delta ,\ \forall x \in (a,b) \} \\
 &= P\{ |T_0 + s_1 (x - \bar{x}) T_1 / s_0| < \delta/s_0 ,\ \forall x \in (a,b) \}
\end{aligned} \]

and this probability is given by the integral of the bivariate t-density
over a parallelogram in the $(t_0, t_1)$ plane. Gafarian tabulates these
probabilities for the special case in which the interval (a,b) is symmetric
about $\bar{x}$, i.e., $a = \bar{x} - h$, $b = \bar{x} + h$. In this case the probabilities
reduce to a multiple of the integral of the density over the triangular
region with vertices $(0,0)$, $(0, \delta/s_1 h)$, $(\delta/s_0, 0)$. The probabilities
are tabulated as a function of n, $c = s_0/hs_1$ and $d = \delta/s_0$. Given
n, c and the required confidence coefficient, d and hence $\delta$ may be
obtained. The bivariate t-density has not been integrated over more
general parallelograms.
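The coverage probability of a symmetric fixed-width band can also be approximated by simulation. The sketch below is an illustration only and is not Gafarian's tabulation method: it draws from the bivariate t-distribution directly and uses the fact that, over a symmetric interval, the largest value of $|T_0 + s_1 (x - \bar{x}) T_1 / s_0|$ is $|T_0| + (s_1 h / s_0)|T_1|$. The function name, the argument `ratio` (standing for $s_1 h / s_0$), the simulation size and the seed are all assumptions made for the example.

```python
import math
import random

def fixed_width_coverage(ratio, d, df, n_sim=100_000, seed=0):
    """Monte Carlo estimate of P{|T0| + ratio * |T1| < d}.

    ratio stands for s1 * h / s0 and d for delta / s0. T0 and T1 share one
    chi-square denominator with df degrees of freedom, which is what makes
    the pair bivariate t rather than two independent t variables.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sim):
        z0 = rng.gauss(0.0, 1.0)
        z1 = rng.gauss(0.0, 1.0)
        # chi-square(df) is a gamma variable with shape df/2 and scale 2
        chi2 = rng.gammavariate(df / 2.0, 2.0)
        scale = math.sqrt(chi2 / df)
        # worst case over the symmetric interval occurs at an end point
        if (abs(z0) + ratio * abs(z1)) / scale < d:
            hits += 1
    return hits / n_sim
```

With `ratio = 0` the band collapses to a single point and the estimate should be close to the ordinary two-sided t probability, which gives a simple sanity check on the simulation.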

3. APPLICATION TO ANALYSIS OF COVARIANCE



Consider the two-sample analysis of covariance model with a single 
covariate, when the regression lines are not parallel. Let 

\[ y = X\beta + \varepsilon \]

where

\[ X' = \begin{pmatrix}
1 & \cdots & 1 & 1 & \cdots & 1 \\
x_1 & \cdots & x_{n_1} & x_{n_1+1} & \cdots & x_n \\
0 & \cdots & 0 & 1 & \cdots & 1 \\
0 & \cdots & 0 & x_{n_1+1} & \cdots & x_n
\end{pmatrix} , \quad \beta' = (\beta_0, \beta_1, \beta_2, \beta_3) , \]

and

\[ \varepsilon \sim N(0, \sigma^2 I_n) . \]

This model represents the two regressions

\[ y_i = \beta_0 + \beta_1 x_i , \quad i = 1, \ldots, n_1 , \]

\[ y_i = \beta_0 + \beta_2 + (\beta_1 + \beta_3) x_i , \quad i = n_1 + 1, \ldots, n . \]

Unbiased maximum likelihood estimates are again

\[ \hat\beta = S^{-1} X'y , \]

\[ \hat\sigma^2 = (y'y - y'X\hat\beta)/(n - 4) , \]

and it is easily verified that the covariance matrix of $\hat\beta$ is

\[ \sigma^2 S^{-1} = \sigma^2 (S^{ij}) , \]

so that, for example,

\[ S^{33} = \frac{1}{\sum_{i=1}^{n_1} (x_i - \bar{x}_1)^2} + \frac{1}{\sum_{i=n_1+1}^{n} (x_i - \bar{x}_2)^2} , \]

where $\bar{x}_j$ is the mean of x for the j-th group (j = 1, 2). The
distance between the population regression lines at $x = x_0$ is $\beta_2 + \beta_3 x_0$,
and a $100(1 - \alpha)\%$ confidence interval for this distance, when $x_0$ is
specified, is

\[ \beta_2 + \beta_3 x_0 \in \hat\beta_2 + \hat\beta_3 x_0 \pm t^{\alpha/2}_{n-4}\, \hat\sigma\, (S^{22} + 2x_0 S^{23} + x_0^2 S^{33})^{1/2} . \]

If simultaneous confidence intervals for all x are desired with confidence
coefficient $100(1 - \alpha)\%$, then $t^{\alpha/2}_{n-4}$ is replaced by $(2F^{\alpha}_{2,n-4})^{1/2}$ (see
Potthoff, 1964), and we obtain

\[ \beta_2 + \beta_3 x \in \hat\beta_2 + \hat\beta_3 x \pm (2F^{\alpha}_{2,n-4})^{1/2}\, \hat\sigma\, (S^{22} + 2xS^{23} + x^2 S^{33})^{1/2} . \]



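Again, in practical cases, our interest will be limited to a finite
range of x. We now extend Gafarian's results to this case. First
reparametrize the model as

\[ y = X^{*}\gamma + \varepsilon \]

where

\[ X^{*\prime} = \begin{pmatrix}
1 & \cdots & 1 & 1 & \cdots & 1 \\
x_1 & \cdots & x_{n_1} & x_{n_1+1} & \cdots & x_n \\
0 & \cdots & 0 & 1 & \cdots & 1 \\
0 & \cdots & 0 & x_{n_1+1} - k & \cdots & x_n - k
\end{pmatrix} , \]

so that $\gamma_2 = \beta_2 + \beta_3 k$ and $\gamma_3 = \beta_3$, and
where k is to be determined so that $\hat\gamma_2$ and $\hat\gamma_3$ are independently
distributed. The covariance matrix of the $\hat\gamma_i$ is now $\sigma^2 (X^{*\prime} X^{*})^{-1}$. Thus

\[ \mathrm{cov}(\hat\gamma_2, \hat\gamma_3) = -\sigma^2 \left\{ \frac{\bar{x}_1 - k}{\sum_{i=1}^{n_1} (x_i - \bar{x}_1)^2} + \frac{\bar{x}_2 - k}{\sum_{i=n_1+1}^{n} (x_i - \bar{x}_2)^2} \right\} \]

and this is zero when

\[ k = \left\{ \bar{x}_1 \sum_{i=n_1+1}^{n} (x_i - \bar{x}_2)^2 + \bar{x}_2 \sum_{i=1}^{n_1} (x_i - \bar{x}_1)^2 \right\} \Big/ \left\{ \sum_{i=1}^{n_1} (x_i - \bar{x}_1)^2 + \sum_{i=n_1+1}^{n} (x_i - \bar{x}_2)^2 \right\} . \]

For this value of k, we have

\[ \mathrm{var}(\hat\gamma_2) = \sigma^2 \left\{ \frac{1}{n_1} + \frac{1}{n_2} + \frac{(\bar{x}_1 - \bar{x}_2)^2}{\sum_{i=1}^{n_1} (x_i - \bar{x}_1)^2 + \sum_{i=n_1+1}^{n} (x_i - \bar{x}_2)^2} \right\} = \sigma^2 s_2^2 , \]

\[ \mathrm{var}(\hat\gamma_3) = \sigma^2 \left\{ \frac{1}{\sum_{i=1}^{n_1} (x_i - \bar{x}_1)^2} + \frac{1}{\sum_{i=n_1+1}^{n} (x_i - \bar{x}_2)^2} \right\} = \sigma^2 s_3^2 . \]

Then as in Section 2,

\[ (\hat\gamma_2 - \gamma_2)/s_2\sigma \quad \text{and} \quad (\hat\gamma_3 - \gamma_3)/s_3\sigma \]

are independent standard normal variates, and hence

\[ T_2 = (\hat\gamma_2 - \gamma_2)/s_2\hat\sigma \quad \text{and} \quad T_3 = (\hat\gamma_3 - \gamma_3)/s_3\hat\sigma \]

have a bivariate t-distribution with n - 4 degrees of freedom. Hence

\[ \begin{aligned}
P\{ |\hat\gamma_2 - \gamma_2 + (\hat\gamma_3 - \gamma_3)(x - k)| < \delta\hat\sigma ,\ \forall x \in (a,b) \}
 = P\{ |T_2 + s_3 (x - k) T_3 / s_2| < \delta/s_2 ,\ \forall x \in (a,b) \} .
\end{aligned} \]

Provided that (a,b) is symmetric about k, i.e., a = k - h, b = k + h,
these probabilities may be read directly from Gafarian's tables, with the
sample size minus 2 used for Gafarian's n, $c = s_3 h/s_2$, and $d = \delta/s_2$.
Again given n, c and the required confidence coefficient, d and
hence $\delta$ may be obtained. The confidence band is then

\[ \gamma_2 + \gamma_3 (x - k) = \beta_2 + \beta_3 x \in \hat\beta_2 + \hat\beta_3 x \pm \delta\hat\sigma , \quad x \in (k - h, k + h) . \]

The inconvenient requirement of symmetry about k results from the
limited tables available.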
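The centring constant k and the scale factors $s_2$, $s_3$ depend only on the group sizes, the group means of x and the within-group sums of squares of x. A minimal sketch of that calculation follows; the function and argument names are illustrative and do not come from the paper, and the formulas used are restated in the comments.

```python
import math

def decorrelating_constants(n1, n2, xbar1, xbar2, sxx1, sxx2):
    """Return (k, s2, s3) from the group summaries.

    sxx1 and sxx2 are the within-group sums of squares of x about the
    group means xbar1 and xbar2.
    """
    total = sxx1 + sxx2
    # k = (xbar1 * sxx2 + xbar2 * sxx1) / (sxx1 + sxx2) makes the two
    # estimates uncorrelated
    k = (xbar1 * sxx2 + xbar2 * sxx1) / total
    # var(gamma2_hat) / sigma^2 = 1/n1 + 1/n2 + (xbar1 - xbar2)^2 / (sxx1 + sxx2)
    s2 = math.sqrt(1.0 / n1 + 1.0 / n2 + (xbar1 - xbar2) ** 2 / total)
    # var(gamma3_hat) / sigma^2 = 1/sxx1 + 1/sxx2
    s3 = math.sqrt(1.0 / sxx1 + 1.0 / sxx2)
    return k, s2, s3
```

Note that k is a weighted average of the two group means, with each mean weighted by the other group's sum of squares, so it always lies between $\bar{x}_1$ and $\bar{x}_2$.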

4. RELATION TO THE JOHNSON-NEYMAN TECHNIQUE



The Johnson-Neyman technique is essentially the procedure for finding
those values of x for which the Working-Hotelling simultaneous confidence
intervals for the difference between the population regression lines do not
include zero, so that a significant difference between the population means
may be asserted for these values of x. The region of nonsignificance
may be defined as consisting of those values of x for which zero is
included in the confidence interval, i.e., for which

\[ (\hat\beta_2 + \hat\beta_3 x)^2 < 2F^{\alpha}_{2,n-4}\, \hat\sigma^2 (S^{22} + 2xS^{23} + x^2 S^{33}) \]

or

\[ \begin{aligned}
Q(x) &= q_2 x^2 + 2q_1 x + q_0 \\
 &= (\hat\beta_3^2 - A\hat\sigma^2 S^{33}) x^2 + 2(\hat\beta_2 \hat\beta_3 - A\hat\sigma^2 S^{23}) x + (\hat\beta_2^2 - A\hat\sigma^2 S^{22}) < 0 ,
\end{aligned} \]

where $A = 2F^{\alpha}_{2,n-4}$. If the roots of $Q(x) = 0$ are complex, the
region of nonsignificance is empty if $q_2 > 0$, and is the whole x-axis
if $q_0 < 0$. If the roots $x_1 < x_2$ of $Q(x) = 0$ are real, the region of
nonsignificance is $x_1 < x < x_2$ if $q_2 > 0$, and is $x > x_2$, $x < x_1$ if
$q_2 < 0$.
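The case analysis for the quadratic can be written as a short routine. This is a sketch under stated assumptions rather than a reference implementation: the function name and return convention are invented for the example, the inputs `v22`, `v23`, `v33` stand for $\hat\sigma^2 S^{22}$, $\hat\sigma^2 S^{23}$, $\hat\sigma^2 S^{33}$, and `A` is the tabled value $2F^{\alpha}_{2,n-4}$ supplied by the user.

```python
import math

def johnson_neyman_region(b2, b3, v22, v23, v33, A):
    """Classify the region of nonsignificance {x : Q(x) < 0}.

    Returns (kind, roots) where kind is "empty", "all", "between" or
    "outside", and roots is (x1, x2) with x1 < x2 when they are real.
    """
    q2 = b3 * b3 - A * v33
    q1 = b2 * b3 - A * v23
    q0 = b2 * b2 - A * v22
    if q2 == 0.0:
        raise ValueError("Q(x) is not quadratic; treat the linear case separately")
    # Q(x) = q2 x^2 + 2 q1 x + q0, so the discriminant is 4 (q1^2 - q2 q0)
    disc = q1 * q1 - q2 * q0
    if disc <= 0.0:
        # no real roots (a double root is treated the same way): Q keeps one sign
        return ("empty" if q2 > 0.0 else "all"), None
    root = math.sqrt(disc)
    x1, x2 = sorted(((-q1 - root) / q2, (-q1 + root) / q2))
    return ("between" if q2 > 0.0 else "outside"), (x1, x2)
```

A result of `"outside"` means that a significant difference can be asserted only for x strictly between the two roots.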

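For the extension of the Gafarian technique, the region of
nonsignificance consists of those x for which

\[ \hat\beta_2 + \hat\beta_3 x - \delta\hat\sigma < 0 < \hat\beta_2 + \hat\beta_3 x + \delta\hat\sigma , \]

i.e.,

\[ \frac{-\hat\beta_2 - \delta\hat\sigma}{\hat\beta_3} < x < \frac{-\hat\beta_2 + \delta\hat\sigma}{\hat\beta_3} \]

if $\hat\beta_3 > 0$, and

\[ \frac{-\hat\beta_2 + \delta\hat\sigma}{\hat\beta_3} < x < \frac{-\hat\beta_2 - \delta\hat\sigma}{\hat\beta_3} \]

if $\hat\beta_3 < 0$. If both these values of x fall outside the interval $(k - h, k + h)$,
on opposite sides of k, then the entire interval is a region of
nonsignificance. If both fall outside, but on the same side of k, the region of
nonsignificance is empty.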
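The rules for the fixed-width band amount to intersecting the interval between the two crossing points with $(k - h, k + h)$. A minimal sketch, with a hypothetical function name and with $\delta$ assumed to have been obtained already from the tables:

```python
def fixed_width_region(b2, b3, delta, sigma_hat, k, h):
    """Region of nonsignificance for the fixed-width band on (k - h, k + h).

    Returns (lower, upper), or None when the region is empty.
    """
    if b3 == 0:
        raise ValueError("equal slopes: the band edges never cross zero")
    half = delta * sigma_hat
    # the two x values at which the band edges cross zero, in increasing order
    ends = sorted(((-b2 - half) / b3, (-b2 + half) / b3))
    # the band is only valid on (k - h, k + h), so clip to that interval
    lower = max(ends[0], k - h)
    upper = min(ends[1], k + h)
    return (lower, upper) if lower < upper else None
```

Sorting the two crossing points handles both signs of $\hat\beta_3$ at once, and the clipping step reproduces the "entire interval" and "empty" cases without separate branches.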

5. EXAMPLE AND DISCUSSION 

We illustrate the two procedures with an example from Walker and Lev
[1953, p. 402]. For two groups of $n_1 = 8$ and $n_2 = 10$ slow readers, a
pre-test reading gain score x and a test gain score y were obtained. The
first group received play therapy, the second did not. The data required are

Group 1:  $n_1 = 8$,  $\bar{x}_1 = 0.03125$,  $\sum_{i=1}^{n_1} (x_i - \bar{x}_1)^2 = 0.4197$,
          $\hat{y} = 0.9675 + 1.4401x$

Group 2:  $n_2 = 10$,  $\bar{x}_2 = -0.1900$,  $\sum_{i=n_1+1}^{n} (x_i - \bar{x}_2)^2 = 0.6718$,
          $\hat{y} = 0.2063 - 0.0881x$

$\hat\sigma^2 = 0.1091$,  $\hat\sigma = 0.3303$

The variances and covariance of $\hat\beta_2$ and $\hat\beta_3$ may be obtained by substituting in
$\hat\sigma^2 S^{-1}$, but they are most easily obtained by running the four-variable
equation through a multiple regression program which outputs the covariance
matrix of the regression coefficients. By either method,

\[ \hat\sigma^2 S^{22} = 0.03065 , \quad \hat\sigma^2 S^{23} = 0.02272 , \quad \hat\sigma^2 S^{33} = 0.42219 . \]

For the Johnson-Neyman technique, the quadratic Q(x) = 0 becomes,
taking $A = 2F^{.05}_{2,14} = 7.48$,

\[ -0.8226x^2 + 1.9866x + 0.3502 = 0 \]

with roots -0.165 and 2.580. Since $q_2 < 0$, the region of nonsignificance
is x < -0.165, x > 2.580 (this differs from the result in Walker and Lev,
where the t critical value is used instead of F).

For the Gafarian method, we must first decide on the range of x
over which the confidence interval is desired. The extreme values of x
in the original data are -0.6 and +0.33, while

\[ k = \{ (0.03125)(0.6718) - (0.19)(0.4197) \} / \{ 0.4197 + 0.6718 \} = -0.05382 . \]

We choose an interval symmetric about k which just includes the upper
value of x, i.e., we take h = 0.39 (this will not cover the lower
value of x; h would have to be increased to 0.55 to do this). Then

\[ s_2^2 = \frac{1}{8} + \frac{1}{10} + \frac{(0.22125)^2}{1.0915} = 0.26985 , \quad s_2 = 0.51947 , \]

\[ s_3^2 = \frac{1}{0.4197} + \frac{1}{0.6718} = 3.87119 , \quad s_3 = 1.96753 , \]

\[ c = 1.4772 . \]



From Gafarian's table, for c = 1.5, n - 2 = 16, $1 - \alpha = .95$,
d = 2.975 by interpolation, while for c = 1.4, n - 2 = 16,
$1 - \alpha = .95$, d = 3.050 by interpolation. Linear interpolation gives
d = 2.992 for c = 1.4772. Hence $\delta = ds_2 = 1.554$, and the region of
nonsignificance is given by

\[ \frac{-.7612 - (1.554)(.3303)}{1.5282} < x < \frac{-.7612 + (1.554)(.3303)}{1.5282} , \]

i.e.,

\[ -0.834 < x < -0.162 . \]
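The interpolation step for d can be checked with a few lines of arithmetic. This sketch uses only the table entries quoted in the text; the function name is illustrative.

```python
def interpolate_d(c, c_lo, d_lo, c_hi, d_hi):
    """Linear interpolation of the tabled constant d between two values of c."""
    weight = (c - c_lo) / (c_hi - c_lo)
    return d_lo + weight * (d_hi - d_lo)

# table entries quoted in the text: d = 3.050 at c = 1.4 and d = 2.975 at c = 1.5
d = interpolate_d(1.4772, 1.4, 3.050, 1.5, 2.975)
# delta = d * s2, with s2 = 0.51947 from the example
delta = d * 0.51947
```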




Note that the lower limit of x lies beyond k - h, so the region of
nonsignificance is -0.444 < x < -0.162. The practical conclusions are
identical from the two methods. For x > -0.162 (Gafarian) or -0.165
(Johnson-Neyman), the therapy group is superior to the nontherapy group.
The Gafarian method restricts such conclusions to the range (-0.444,
0.336), while the Johnson-Neyman method finds another nonsignificance
region for x > 2.580. Since this is far beyond the range of the data,
it is of no practical interest.

It should be noted that the Gafarian method is very sensitive to h,
i.e., to the length of the interval over which the confidence band is to
be constructed. If we wished to cover the entire range of x, then
h = 0.55, whence c = 1.0475, d = 3.454, $\delta = 1.794$, and the region
of nonsignificance is (-0.886 < x < -0.110). Thus h should be chosen
just large enough to cover the x-interval of practical interest.

It seems difficult to give a simple rule for the choice of the better 
procedure. It is not, of course, valid to calculate both intervals and 
choose the shorter, for the resulting interval will correspond to a 
confidence coefficient less than the required $100(1 - \alpha)\%$. Gafarian
finds the area of the fixed-width confidence band to be less than that of
the Working-Hotelling band restricted to the finite interval when c > 1.5,
but greater when c < 1.5, for $\alpha = .05$. It does not immediately follow
that the regions of significance will behave in the same way, but in the 
absence of further information such a rule might be considered. The 
above example lends some support to such a rule. 